

Search for: All records

Creators/Authors contains: "Ho, Shirley"


  1. Abstract

    Quasars are bright and unobscured active galactic nuclei (AGN) thought to be powered by the accretion of matter around supermassive black holes at the centers of galaxies. The temporal variability of a quasar’s brightness contains valuable information about its physical properties. The UV/optical variability is thought to be a stochastic process, often represented as a damped random walk described by a stochastic differential equation (SDE). Upcoming wide-field telescopes such as the Rubin Observatory Legacy Survey of Space and Time (LSST) are expected to observe tens of millions of AGN in multiple filters over a ten-year period, so there is a need for efficient and automated modeling techniques that can handle the large volume of data. Latent SDEs are machine learning models well suited for modeling quasar variability, as they can explicitly capture the underlying stochastic dynamics. In this work, we adapt latent SDEs to jointly reconstruct multivariate quasar light curves and infer their physical properties such as the black hole mass, inclination angle, and temperature slope. Our model is trained on realistic simulations of LSST ten-year quasar light curves, and we demonstrate its ability to reconstruct quasar light curves even in the presence of long seasonal gaps and irregular sampling across different bands, outperforming a multioutput Gaussian process regression baseline. Our method has the potential to provide a deeper understanding of the physical properties of quasars and is applicable to a wide range of other multivariate time series with missing data and irregular sampling.

     
  2. Complex astrophysical systems often exhibit low-scatter relations between observable properties (e.g., luminosity, velocity dispersion, oscillation period). These scaling relations illuminate the underlying physics, and can provide observational tools for estimating masses and distances. Machine learning can provide a fast and systematic way to search for new scaling relations (or for simple extensions to existing relations) in abstract high-dimensional parameter spaces. We use a machine learning tool called symbolic regression (SR), which models patterns in a dataset in the form of analytic equations. We focus on the Sunyaev-Zeldovich flux–cluster mass relation ($Y_\mathrm{SZ}-M$), the scatter in which affects inference of cosmological parameters from cluster abundance data. Using SR on the data from the IllustrisTNG hydrodynamical simulation, we find a new proxy for cluster mass which combines $Y_\mathrm{SZ}$ and concentration of ionized gas ($c_\mathrm{gas}$): $M \propto Y_\mathrm{conc}^{3/5} \equiv Y_\mathrm{SZ}^{3/5}\,(1 - A\,c_\mathrm{gas})$. $Y_\mathrm{conc}$ reduces the scatter in the predicted $M$ by ∼20–30% for large clusters ($M \gtrsim 10^{14}\,h^{-1}\,M_\odot$), as compared to using just $Y_\mathrm{SZ}$. We show that the dependence on $c_\mathrm{gas}$ is linked to cores of clusters exhibiting larger scatter than their outskirts. Finally, we test $Y_\mathrm{conc}$ on clusters from CAMELS simulations and show that $Y_\mathrm{conc}$ is robust against variations in cosmology, subgrid physics, and cosmic variance. Our results and methodology can be useful for accurate multiwavelength cluster mass estimation from upcoming CMB and X-ray surveys like ACT, SO, eROSITA and CMB-S4.
  3. ABSTRACT

    Feedback from active galactic nuclei (AGNs) and supernovae can affect measurements of integrated Sunyaev–Zeldovich (SZ) flux of haloes ($Y_\mathrm{SZ}$) from cosmic microwave background (CMB) surveys, and cause its relation with the halo mass ($Y_\mathrm{SZ}$–$M$) to deviate from the self-similar power-law prediction of the virial theorem. We perform a comprehensive study of such deviations using CAMELS, a suite of hydrodynamic simulations with extensive variations in feedback prescriptions. We use a combination of two machine learning tools (random forest and symbolic regression) to search for analogues of the $Y$–$M$ relation which are more robust to feedback processes for low masses ($M \lesssim 10^{14}\,h^{-1}\,\mathrm{M}_\odot$); we find that simply replacing $Y \rightarrow Y(1 + M_*/M_\mathrm{gas})$ in the relation makes it remarkably self-similar. This could serve as a robust multiwavelength mass proxy for low-mass clusters and galaxy groups. Our methodology can also be generally useful to improve the domain of validity of other astrophysical scaling relations. We also forecast that measurements of the $Y$–$M$ relation could provide per cent level constraints on certain combinations of feedback parameters and/or rule out a major part of the parameter space of supernova and AGN feedback models used in current state-of-the-art hydrodynamic simulations. Our results can be useful for using upcoming SZ surveys (e.g. SO, CMB-S4) and galaxy surveys (e.g. DESI and Rubin) to constrain the nature of baryonic feedback. Finally, we find that the alternative relation, $Y$–$M_*$, provides information on feedback that is complementary to that from $Y$–$M$.

     
  4. Abstract

    A wealth of cosmological and astrophysical information is expected from many ongoing and upcoming large-scale surveys. It is crucial to prepare for these surveys now and develop tools that can efficiently extract the most information. We present HIFlow: a fast generative model of neutral hydrogen (H I) maps that is conditioned only on cosmology ($\Omega_\mathrm{m}$ and $\sigma_8$) and designed using a class of normalizing flow models, the masked autoregressive flow. HIFlow is trained on the state-of-the-art simulations from the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project. HIFlow has the ability to generate realistic diverse maps without explicitly incorporating the expected two-dimensional map structure into the flow as an inductive bias. We find that HIFlow is able to reproduce the CAMELS average and standard deviation H I power spectrum within a factor of ≲2, scoring a very high $R^2 > 90\%$. By inverting the flow, HIFlow provides a tractable high-dimensional likelihood for efficient parameter inference. We show that the conditional HIFlow on cosmology is successfully able to marginalize over astrophysics at the field level, regardless of the stellar and AGN feedback strengths. This new tool represents a first step toward a more powerful parameter inference, maximizing the scientific return of future H I surveys, and opening a new avenue to minimize the loss of complex information due to data compression down to summary statistics.
  5. Abstract

    Using the second data release from the Zwicky Transient Facility (ZTF), Chen et al. created a ZTF Catalog of Periodic Variable Stars (ZTF CPVS) of 781,602 periodic variable stars (PVSs) with 11 class labels. Here, we provide a new classification model of PVSs in the ZTF CPVS using a convolutional variational autoencoder and hierarchical random forest. We cross-match the sky coordinates of PVSs in the ZTF CPVS with those presented in the SIMBAD catalog. We identify non-stellar objects that were not previously classified, including extragalactic objects such as Quasi-Stellar Objects, Active Galactic Nuclei, supernovae and planetary nebulae. We then create a new labeled training set with 13 classes in two levels. We obtain a reasonable level of completeness (≳90%) for certain classes of PVSs, although we have poorer completeness in other classes (∼40% in some cases). Our new labels for the ZTF CPVS are available via Zenodo.
  6. Abstract

    Many different studies have shown that a wealth of cosmological information resides on small, nonlinear scales. Unfortunately, there are two challenges to overcome to utilize that information. First, we do not know the optimal estimator that will allow us to retrieve the maximum information. Second, baryonic effects impact that regime significantly and in a poorly understood manner. Ideally, we would like to use an estimator that extracts the maximum cosmological information while marginalizing over baryonic effects. In this work we show that neural networks can achieve that when considering some simple scenarios. We make use of data where the maximum amount of cosmological information is known: power spectra and 2D Gaussian density fields. We also contaminate the data with simplified baryonic effects and train neural networks to predict the value of the cosmological parameters. For these data, we show that neural networks can (1) extract the maximum available cosmological information, (2) marginalize over baryonic effects, and (3) extract cosmological information that is buried in the regime dominated by baryonic physics. We also show that neural networks learn the priors of the data they are trained on, affecting their extrapolation properties. We conclude that a promising strategy to maximize the scientific return of cosmological experiments is to train neural networks on state-of-the-art numerical simulations with different strengths and implementations of baryonic effects.
  7. Abstract

    Periodic variables illuminate the physical processes of stars throughout their lifetime. Wide-field surveys continue to increase our discovery rates of periodic variable stars. Automated approaches are essential to identify interesting periodic variable stars for multiwavelength and spectroscopic follow-up. Here we present a novel unsupervised machine-learning approach to hunt for anomalous periodic variables using phase-folded light curves presented in the Zwicky Transient Facility Catalogue of Periodic Variable Stars by Chen et al. We use a convolutional variational autoencoder to learn a low-dimensional latent representation, and we search for anomalies within this latent dimension via an isolation forest. We identify anomalies with irregular variability. Most of the top anomalies are likely highly variable red giants or asymptotic giant branch stars concentrated in the Milky Way galactic disk; a fraction of the identified anomalies are more consistent with young stellar objects. Detailed spectroscopic follow-up observations are encouraged to reveal the nature of these anomalies.

     
  8. This data release contains 730,184 periodic transients with the new class labels in a CSV file, together with the cross-match results of periodic variable stars (PVSs) in the ZTF CPVS with the SIMBAD catalog. Classifications and details of this data set are available in Cheung et al. (2021) and Chan et al. (2021).

     
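
The first abstract above describes quasar UV/optical variability as a damped random walk observed with long seasonal gaps and irregular sampling. The following is a minimal sketch of that idea only, not the latent SDE model or the simulations from that work. It samples a damped random walk (an Ornstein-Uhlenbeck process) exactly at irregular times using the standard conditional Gaussian update. The function names, the 180-day season length, and all parameter values (`mean_mag`, `tau`, `sigma_inf`) are hypothetical choices made for illustration.

```python
import math
import random


def simulate_drw(times, mean_mag, tau, sigma_inf, rng):
    """Sample a damped random walk at the given increasing times.

    tau is the damping timescale (same units as times) and sigma_inf is
    the long-term standard deviation of the process. Both are
    illustrative parameters, not values taken from any paper.
    """
    # Draw the first point from the stationary distribution.
    values = [rng.gauss(mean_mag, sigma_inf)]
    for t_prev, t_next in zip(times, times[1:]):
        decay = math.exp(-(t_next - t_prev) / tau)
        # Exact conditional mean and spread over an arbitrary time step.
        cond_mean = mean_mag + (values[-1] - mean_mag) * decay
        cond_std = sigma_inf * math.sqrt(1.0 - decay ** 2)
        values.append(rng.gauss(cond_mean, cond_std))
    return values


def seasonal_times(n_years, nights_per_season, rng):
    """Irregular observing times in days, with a gap after each season.

    Assumes a hypothetical 180-day observing season per 365-day year.
    """
    times = []
    for year in range(n_years):
        start = 365.0 * year
        season = sorted(
            rng.uniform(start, start + 180.0) for _ in range(nights_per_season)
        )
        times.extend(season)
    return times


rng = random.Random(0)
times = seasonal_times(n_years=10, nights_per_season=20, rng=rng)
light_curve = simulate_drw(times, mean_mag=20.0, tau=200.0, sigma_inf=0.2, rng=rng)
```

Because the update is exact for any time step, the same function handles both the dense sampling within a season and the long gaps between seasons without a fixed time grid.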